Search | VHL Regional Portal

Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA.

Lemane, Téo; Lezzoche, Nolan; Lecubin, Julien; Pelletier, Eric; Lescot, Magali; Chikhi, Rayan; Peterlongo, Pierre.

Nat Comput Sci ; 4(2): 104-109, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38413777

ABSTRACT

Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as it is challenging to efficiently search them for any sequence(s) of interest. We present kmindex, an approach that can index thousands of metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. Here we demonstrate the scalability of kmindex by successfully indexing 1,393 marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server Ocean Read Atlas, which enables real-time queries on the Tara Oceans dataset.

Subjects

Genomics; Seawater; Oceans and Seas; Metagenome/genetics; Databases, Nucleic Acid

The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies.

Willink, Beatriz; Tunström, Kalle; Nilén, Sofie; Chikhi, Rayan; Lemane, Téo; Takahashi, Michihiko; Takahashi, Yuma; Svensson, Erik I; Wheat, Christopher West.

Nat Ecol Evol ; 8(1): 83-97, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37932383

ABSTRACT

Sex-limited morphs can provide profound insights into the evolution and genomic architecture of complex phenotypes. Inter-sexual mimicry is one particular type of sex-limited polymorphism in which a novel morph resembles the opposite sex. While inter-sexual mimics are known in both sexes and a diverse range of animals, their evolutionary origin is poorly understood. Here, we investigated the genomic basis of female-limited morphs and male mimicry in the common bluetail damselfly. Differential gene expression between morphs has been documented in damselflies, but no causal locus has been previously identified. We found that male mimicry originated in an ancestrally sexually dimorphic lineage in association with multiple structural changes, probably driven by transposable element activity. These changes resulted in ~900 kb of novel genomic content that is partly shared by male mimics in a close relative, indicating that male mimicry is a trans-species polymorphism. More recently, a third morph originated following the translocation of part of the male-mimicry sequence into a genomic position ~3.5 Mb apart. We provide evidence of balancing selection maintaining male mimicry, in line with previous field population studies. Our results underscore how structural variants affecting a handful of potentially regulatory genes and morph-specific genes can give rise to novel and complex phenotypic polymorphisms.

Subjects

Odonata; Animals; Female; Male; Odonata/genetics; Polymorphism, Genetic; Genomics

decOM: similarity-based microbial source tracking of ancient oral samples using k-mer-based methods.

Duitama González, Camila; Vicedomini, Riccardo; Lemane, Téo; Rascovan, Nicolas; Richard, Hugues; Chikhi, Rayan.

Microbiome ; 11(1): 243, 2023 Nov 06.

Article in English | MEDLINE | ID: mdl-37926832

ABSTRACT

BACKGROUND: The analysis of ancient oral metagenomes from archaeological human and animal samples is largely confounded by contaminant DNA sequences from modern and environmental sources. Existing methods for Microbial Source Tracking (MST) estimate the proportions of environmental sources, but do not perform well on ancient metagenomes. We developed a novel method called decOM for Microbial Source Tracking and classification of ancient and modern metagenomic samples using k-mer matrices. RESULTS: We analysed a collection of 360 ancient oral, modern oral, sediment/soil and skin metagenomes, using stratified five-fold cross-validation. decOM estimates the contributions of these source environments in ancient oral metagenomic samples with high accuracy, outperforming two state-of-the-art methods for source tracking, FEAST and mSourceTracker. CONCLUSIONS: decOM is a high-accuracy microbial source tracking method, suitable for ancient oral metagenomic data sets. The decOM method is generic and could also be adapted for MST of other ancient and modern types of metagenomes. We anticipate that decOM will be a valuable tool for MST of ancient metagenomic studies. Video Abstract.

Subjects

Metagenome; Metagenomics; Animals; Humans; Metagenomics/methods

kmdiff, large-scale and user-friendly differential k-mer analyses.

Lemane, Téo; Chikhi, Rayan; Peterlongo, Pierre.

Bioinformatics ; 38(24): 5443-5445, 2022 Dec 13.

Article in English | MEDLINE | ID: mdl-36315078

ABSTRACT

SUMMARY: Genome-wide association studies elucidate links between genotypes and phenotypes. Recent studies point out the interest of conducting such experiments using k-mers as the base signal instead of single-nucleotide polymorphisms. We propose a tool, kmdiff, that performs differential k-mer analyses on large sequencing cohorts in an order of magnitude less time and memory than previously possible. AVAILABILITY AND IMPLEMENTATION: https://github.com/tlemane/kmdiff. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Software; Sequence Analysis, DNA; Genome-Wide Association Study; Genotype

The K-mer File Format: a standardized and compact disk representation of sets of k-mers.

Dufresne, Yoann; Lemane, Teo; Marijon, Pierre; Peterlongo, Pierre; Rahman, Amatur; Kokot, Marek; Medvedev, Paul; Deorowicz, Sebastian; Chikhi, Rayan.

Bioinformatics ; 38(18): 4423-4425, 2022 Sep 15.

Article in English | MEDLINE | ID: mdl-35904548

ABSTRACT

SUMMARY: Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3-5× compared to other formats, and bringing interoperability across tools. AVAILABILITY AND IMPLEMENTATION: Format specification, C++/Rust API, tools: https://github.com/Kmer-File-Format/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Software; Sequence Analysis, DNA; Compact Disks

kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections.

Lemane, Téo; Medvedev, Paul; Chikhi, Rayan; Peterlongo, Pierre.

Bioinform Adv ; 2(1): vbac029, 2022.

Article in English | MEDLINE | ID: mdl-36699393

ABSTRACT

Summary: When indexing large collections of short-read sequencing data, a common operation that has now been implemented in several tools (Sequence Bloom Trees and variants, BIGSI) is to construct a collection of Bloom filters, one per sample. Each Bloom filter is used to represent a set of k-mers which approximates the desired set of all the non-erroneous k-mers present in the sample. However, this approximation is imperfect, especially in the case of metagenomics data. Erroneous but abundant k-mers are wrongly included, and non-erroneous but low-abundant ones are wrongly discarded. We propose kmtricks, a novel approach for generating Bloom filters from terabase-sized collections of sequencing data. Our main contributions are (i) an efficient method for jointly counting k-mers across multiple samples, including a streamlined Bloom filter construction by directly counting, partitioning and sorting hashes instead of k-mers, which is approximately four times faster than state-of-the-art tools; (ii) a novel technique that takes advantage of joint counting to preserve low-abundant k-mers present in several samples, improving the recovery of non-erroneous k-mers. Our experiments highlight that this technique preserves around 8× more k-mers than the usual yet crude filtering of low-abundance k-mers in a large metagenomics dataset. Availability and implementation: https://github.com/tlemane/kmtricks. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
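
The idea in the kmtricks abstract above (one Bloom filter per sample, with low-abundance k-mers kept when other samples support them) can be illustrated with a minimal Python sketch. This is not the kmtricks implementation or its API: the names `select_kmers`, `build_filters`, `hard_min` and `share_min` are hypothetical, the rescue rule shown is only one plausible reading of "present in several samples", and the sketch counts k-mers in memory rather than partitioning and sorting hashes.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not optimized)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b digest per hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def select_kmers(samples, k, hard_min, share_min):
    """Choose the k-mers to keep per sample.

    Assumed rule: keep a k-mer if its count in the sample is >= hard_min,
    or if it is that abundant in at least share_min samples overall.
    """
    counts = {name: Counter(km for read in reads for km in kmers(read, k))
              for name, reads in samples.items()}
    solid_in = Counter()  # number of samples where the k-mer is abundant
    for sample_counts in counts.values():
        for km, n in sample_counts.items():
            if n >= hard_min:
                solid_in[km] += 1
    return {name: {km for km, n in sample_counts.items()
                   if n >= hard_min or solid_in[km] >= share_min}
            for name, sample_counts in counts.items()}


def build_filters(samples, k, hard_min, share_min, n_bits=1 << 16, n_hashes=3):
    """Build one Bloom filter per sample from the selected k-mers."""
    filters = {}
    for name, kept in select_kmers(samples, k, hard_min, share_min).items():
        bf = BloomFilter(n_bits, n_hashes)
        for km in kept:
            bf.add(km)
        filters[name] = bf
    return filters
```

With a plain per-sample threshold, a k-mer seen once in a sample would be discarded; under the assumed rule it survives if enough other samples contain it abundantly, which is the intuition behind joint counting.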

Identification of isolated or mixed strains from long reads: a challenge met on Streptococcus thermophilus using a MinION sequencer.

Siekaniec, Grégoire; Roux, Emeline; Lemane, Téo; Guédon, Eric; Nicolas, Jacques.

Microb Genom ; 7(11), 2021 Nov.

Article in English | MEDLINE | ID: mdl-34812718

ABSTRACT

This study aimed to provide efficient recognition of bacterial strains on personal computers from MinION (Nanopore) long read data. Thanks to the fall in sequencing costs, the identification of bacteria can now proceed by whole genome sequencing. MinION is a fast, but highly error-prone sequencing device and it is a challenge to successfully identify the strain content of unknown simple or complex microbial samples. It is heavily constrained by memory management and fast access to the read and genome fragments. Our strategy involves three steps: indexing of known genomic sequences for a given or several bacterial species; a request process to assign a read to a strain by matching it to the closest reference genomes; and a final step looking for a minimum set of strains that best explains the observed reads. We have applied our method, called ORI, on 77 strains of Streptococcus thermophilus. We worked on several genomic distances and obtained a detailed classification of the strains, together with a criterion that allows merging of what we termed 'sibling' strains, only separated by a few mutations. Overall, isolated strains can be safely recognized from MinION data. For mixtures of several non-sibling strains, results depend on strain abundance.

Subjects

Nanopores; Streptococcus thermophilus; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Streptococcus thermophilus/genetics; Whole Genome Sequencing
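
The final step described in the ORI abstract above, finding a minimum set of strains that best explains the observed reads, resembles a set-cover problem. The sketch below uses a standard greedy set-cover heuristic as a stand-in; the abstract does not say how ORI solves this step, and the function name `greedy_strain_cover` and the input layout (read identifier mapped to the set of strains it matches) are assumptions made for illustration.

```python
def greedy_strain_cover(read_hits):
    """Greedy approximation of a small strain set explaining all reads.

    read_hits maps each read id to the set of strain names it matches.
    Reads with no matching strain are ignored. Ties between strains are
    broken alphabetically so the result is deterministic.
    """
    unexplained = {read for read, hits in read_hits.items() if hits}
    strains = sorted(set().union(*read_hits.values()))
    chosen = []
    while unexplained:
        # Pick the strain that explains the most still-unexplained reads.
        best = max(
            strains,
            key=lambda s: sum(1 for r in unexplained if s in read_hits[r]),
        )
        chosen.append(best)
        unexplained = {r for r in unexplained if best not in read_hits[r]}
    return chosen
```

A greedy pass does not guarantee the true minimum, and it ignores strain abundance, which the abstract notes affects results for mixtures.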
